The comparison of 16S rRNA gene sequences is widely used to differentiate bacteria; however, this gene can lack resolution
among closely related but distinct members of the same genus. This is a problem in clinical situations in those genera, such as
Neisseria, where some species are associated with disease while others are not. Here, we identified and validated an alternative
genetic target common to all Neisseria species which can be readily sequenced to provide an assay that rapidly and accurately
discriminates among members of the genus. Ribosomal multilocus sequence typing (rMLST) using ribosomal protein genes has
been shown to unambiguously identify these bacteria. The PubMLST Neisseria database (http://pubmlst.org/neisseria/) was
queried to extract the 53 ribosomal protein gene sequences from 44 genomes from diverse species. Phylogenies reconstructed from
these genes were examined, and a single 413-bp fragment of the 50S ribosomal protein L6 (rplF) gene was identified which
produced a phylogeny that was congruent with the phylogeny reconstructed from concatenated ribosomal protein genes. Primers
that enabled the amplification and direct sequencing of the rplF gene fragment were designed to validate the assay in vitro and in
silico. Allele sequences were defined for the gene fragment, associated with particular species names, and stored on the PubMLST
Neisseria database, providing a curated electronic resource. This approach provides an alternative to 16S rRNA gene sequencing,
which can be readily replicated for other organisms for which more resolution is required, and it has potential applications in
high-resolution metagenomic studies.



Rapid and reliable identification of bacteria is fundamental to 
experimental microbiology, particularly in clinical settings 
where it is frequently necessary to distinguish organisms which are 
genetically very closely related but which have stable and distinct 
disease phenotypes. A good example is the genus Neisseria, which 
comprises mostly commensal inhabitants of the mucosal surfaces 
of humans and animals but includes two significant pathogens, 
Neisseria gonorrhoeae, the gonococcus, which causes gonorrhea, 
and Neisseria meningitidis, the meningococcus, which can cause 
meningitis and septicemia. As the meningococcus is an "acciden- 
tal pathogen," which is frequently carried but rarely invasive, spe- 
cies identification is particularly important in community studies, 
where meningococcal carriage rates are estimated in the presence 
of related species which are not easily distinguished using conven- 
tional methods. This is especially important when vaccines are 
being introduced, such as the recently developed protein-polysac- 
charide conjugate serogroup A vaccine (PsA-TT; MenAfriVac) 
(1). Conventional phenotypic identification of bacteria is time- 
consuming and difficult to deploy, especially in resource-limited 
settings, and may suffer from errors in interpretation leading to 
misidentification. 

For isolate characterization purposes, approaches based on
DNA sequencing offer accuracy and reproducibility with the
additional advantage that the data generated can be transferred
electronically and stored on public databases. For many years,
sequence analysis of 16S rDNA, encoding 16S rRNA (ribosomal
DNA sequencing), has played a principal role in this endeavor. In
this approach, part or all of the 16S rRNA gene is sequenced, and
identification is achieved by comparison of this sequence to
curated sequences on web-accessible databases (for example,
http://www.ridom.de/rdna/ [2] and http://eztaxon-e.ezbiocloud.net/
[3]). The 16S rRNA gene has been a valuable target as it is
ubiquitous and composed of both conserved and variable regions,
allowing the design of universal PCR primers to generate nucleotide
sequences that can be used to differentiate among isolates. The 16S
rRNA molecule is so conserved, however, that very similar or
identical sequences are frequently present in more than one
species which have distinct and stable phenotypic properties (4-6).

Recently, ribosomal multilocus sequence typing (rMLST) (7) 
has been proposed as a method which provides an additional ra- 
tional and universal approach to species classification. This ap- 
proach exploits the availability of whole-genome sequence (WGS) 
data by indexing variation at the 53 genetic loci encoding the bac- 
terial ribosomal protein genes. This method has been shown to 
unambiguously determine the species identity of Neisseria iso- 
lates, demonstrating good congruence with both whole-genome 
analyses and more conventional approaches (4). These data indi- 
cated that some species had been misidentified using conventional 
methods, and that minor changes in nomenclature were required 
(8). The rMLST approach, however, requires nucleotide sequence 
variation data at 53 loci and, although these are readily extracted 
from WGS data, such information is not always economically or 
practically available from all specimens. Therefore, the loci in the
rMLST scheme were examined to identify a gene fragment from a
single locus that can be used to rapidly identify Neisseria species in
both the diagnostic and research settings. The target identified, a
413-bp fragment of the 50S ribosomal protein L6 (rplF) gene,
includes both conserved regions suitable for primer design and
variable regions to distinguish sequences from different Neisseria
species. Comparison of the rplF gene fragments provided
sufficient discrimination to identify most species within the genus
accurately, rapidly, and inexpensively.

MATERIALS AND METHODS 

Isolates and genome sequences. Nucleotide sequences were obtained
from 44 genomes which were part of the data set used to validate rMLST
in Neisseria (4); a different set of 44 Neisseria DNA samples (a gift from
Bachra Rokbi, Sanofi Pasteur, Marcy l'Etoile, France), which were used to
validate the assay using Sanger sequencing (see Table S1 in the
supplemental material); and 839 publicly available genome sequences
downloaded from the PubMLST Neisseria database
(http://pubmlst.org/neisseria/) (9), including those deposited as part of the
MRF Meningococcus Genome Library (www.meningitis.org/research/genome).
All isolates analyzed are listed in Table S2 in the supplemental material,
including culture collection isolates and the type strains of Neisseria
polysaccharea, Neisseria cinerea, Neisseria lactamica, Neisseria subflava,
Neisseria mucosa, Neisseria oralis, Neisseria weaveri, Neisseria bacilliformis,
Neisseria dentiae, Neisseria shayeganii, Neisseria canis, Neisseria wadsworthii,
Neisseria animalis, and Neisseria elongata and the type strains of the previous
species Neisseria sicca, Neisseria macacae, and Neisseria flavescens (8).

Extracting and analyzing sequence data from the PubMLST Neisseria
database. Nucleotide sequences from the 53 concatenated ribosomal
protein genes used in rMLST, the seven housekeeping gene fragments
used in MLST, individual ribosomal protein genes, and the 16S rRNA
gene were extracted from the PubMLST Neisseria database (9). Individual
allele designations were also extracted from the database. Sequences were
aligned with Muscle version 3.7 (10), and MEGA5 (11) was employed to
reconstruct phylogenies using the neighbor-joining method. Genetic
distances were determined according to the Kimura two-parameter model
(12), with all ambiguous positions removed from each pairwise sequence
comparison and bootstrap values (13) based on 1,000 replications. DNA
divergence between sequences was calculated using DnaSP5 (14), with fixed
nucleotide sequence differences defined as sites at which all of the sequences
in one sample are different from all the sequences in a second sample.
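The Kimura two-parameter distance used above can be illustrated with a short sketch. With P the proportion of transitions and Q the proportion of transversions among compared sites, the distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The function below is a minimal illustration of that formula, not the MEGA5 implementation; the function name and the choice to skip any site containing a gap or ambiguity code in either sequence (pairwise deletion) are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    Illustrative only: sites where either sequence has a gap or an
    ambiguity code are skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base: ignore this site
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError for saturated pairs where the argument <= 0.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For two sequences differing by a single transition in 10 compared sites, P = 0.1 and Q = 0, so the distance is -0.5 ln(0.8), slightly more than the raw proportion of differences (0.1) because the model corrects for multiple substitutions at the same site.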

Nucleotide sequence determination. The rplF fragment was amplified
using the PCR primers rplF-F (5'-CAGTGAGTGTTGGGGGTGGTGT-3')
and rplF-R (5'-AGGYTGAGGAGKWCGGAAHG-3'), which were
designed using the primer-BLAST tool (15) available from the National
Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/)
and MEGA5 (11). For PCR amplification of the rplF gene fragment,
reaction mixes were incubated for 35 cycles; each cycle consisted of 95°C for 30
s, 55°C for 30 s, and 72°C for 1 min. PCR products were purified using a
precipitation method (16) and the nucleotide sequences of the purified
PCR products were determined on each DNA strand using the primers
described above by cycle sequencing with Applied Biosystems BigDye
ready reaction mix (Life Technologies), used in accordance with the
manufacturer's instructions. Sequence termination reaction products were
separated and the sequence data collected using an Applied Biosystems
3730 DNA analyzer (Life Technologies). Nucleotide sequence data from
forward- and reverse-strand electropherograms were assembled into
single contiguous sequences using SeqSphere (http://www.ridom.de/seqsphere/)
and checked using the Staden suite of programs (17).
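The reverse primer above contains IUPAC degenerate bases (Y, K, W, H), so it represents a pool of oligonucleotides rather than a single sequence. The following sketch, which is not part of the published method, counts and enumerates the sequences a degenerate primer stands for and builds a regular expression for locating it in silico. The function names are hypothetical, and the primer string is taken as printed in this text.

```python
import re
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """Enumerate every non-degenerate sequence the primer represents."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


def primer_regex(primer: str) -> re.Pattern:
    """Compile a regex that matches any sequence covered by the primer."""
    parts = []
    for base in primer.upper():
        bases = IUPAC[base]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return re.compile("".join(parts))


RPLF_R = "AGGYTGAGGAGKWCGGAAHG"  # reverse primer as printed above
print(degeneracy(RPLF_R))  # Y, K, W (2 options each) and H (3 options)
```

With two-fold degeneracy at Y, K, and W and three-fold at H, the reverse primer corresponds to 2 x 2 x 2 x 3 = 24 sequences, whereas the forward primer has no degenerate positions.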

Defining rplF fragment alleles and associating with species. The
database was seeded with the first rplF fragment allele identified (arbitrarily
assigned allele 1), and all genomes in the PubMLST Neisseria database
were searched against this sequence (scanned) for the rplF fragment allele
within the BIGSdb software using the BLAST algorithm (18). All variants
with distinct nucleotide sequences were assigned unique allele
designations. Each allele was also assigned a genospecies association, based on
rMLST species designations (4), with type strains used to confirm these
designations, where available. If genome sequences for type strains were
unavailable, seven locus MLST data were used to confirm species identity
(19). A reference table of alleles with associated genospecies was
constructed within the PubMLST Neisseria database, which can be used to
compare rplF fragment sequences to aid species identification. If an allele
was obtained from a type strain or had an associated rMLST profile, the
genospecies was considered confirmed; if not, it was considered
provisional and labeled as such within the database.
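The allele-definition logic described above can be sketched in a few lines. This is a simplified illustration and not the BIGSdb implementation: it assumes that fragment sequences have already been extracted and trimmed to the same region, it numbers new alleles sequentially (the database itself need not do so), and the function names are hypothetical.

```python
def assign_alleles(sequences, seed):
    """Give each distinct sequence a unique integer allele designation.

    The seed sequence is arbitrarily assigned allele 1; every further
    distinct sequence receives the next unused number.
    """
    alleles = {seed.upper(): 1}
    for sequence in sequences:
        sequence = sequence.upper()
        if sequence not in alleles:
            alleles[sequence] = len(alleles) + 1
    return alleles


def genospecies_status(from_type_strain: bool, has_rmlst_profile: bool) -> str:
    """Apply the confirmed/provisional rule stated in the text."""
    if from_type_strain or has_rmlst_profile:
        return "confirmed"
    return "provisional"
```

For example, `assign_alleles(["ACGT", "ACGA", "acgt", "TTTT"], seed="ACGT")` yields three alleles, because the third sequence is the same as the seed once case is normalized.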

RESULTS 

Phylogenetic analysis of ribosomal protein genes. For the 44
Neisseria isolates for which WGS were available, phylogenies were
generated from the 53 concatenated whole-ribosomal protein
gene sequences used in rMLST and for each of the 53 ribosomal
protein genes individually. These were compared to identify the
single-locus tree that was most congruent with the 53-locus tree in
terms of clustering the different taxa. The rplF gene phylogeny
clustered the sequences consistently with the rMLST tree, and this
locus was chosen for further analyses as it was of sufficient length
and variability, with conserved flanking regions suitable for
primer design. Sanger sequencing of the rplF gene using two
primers designed from sequences extracted from the WGS data
produced a nucleotide fragment of 413 bp, and this determined the
length of the rplF fragment alleles for the assay. A phylogeny
reconstructed from the rplF fragment alleles exhibited the same
species clusters as the phylogeny produced from the 53 concatenated
ribosomal protein gene sequences used in rMLST (Fig. 1).

rplF allele fragment variability. A total of 27 rplF fragment
alleles were identified among the set of 44 isolates used to validate
the rplF assay in vitro, which included 10 Neisseria species (see
Table S1 in the supplemental material). An examination of the
allele sequences from these samples suggested that some isolates
had been misidentified. For example, ATCC 19243, originally
classified as N. subflava, has been identified as N. mucosa using
rMLST. For some isolates, WGS data were unavailable and
discrepancies were resolved by examining MLST loci. Of five isolates
with rplF fragment allele 40, one had been previously identified as
N. subflava, whereas four had been identified as N. sicca; however,
they had almost identical MLST profiles, differing at only one or
two loci, and clustered with N. subflava when a phylogeny was
reconstructed using concatenated MLST nucleotide sequences
(data not shown). With the use of the rplF fragment alleles, an
isolate identified previously as N. sicca with rplF fragment allele 58
was clustered with N. subflava. With the use of concatenated
MLST sequences, this isolate also clustered with N. subflava,
supporting the species designation identified by the rplF assay.

FIG 1 Evolutionary relationships among 44 Neisseria species based on concatenated sequences from 53 whole ribosomal protein genes and single rplF gene
fragments. (a) Concatenated sequences from 53 ribosomal protein genes. (b) rplF gene fragments. Type strains of previous species: *, N. flavescens; **, N. macacae.
The percentages of replicate trees in which the associated taxa clustered together in the bootstrap test (1,000 replicates) are shown next to the branches. The unit
of measure for the scale bars is the number of nucleotide substitutions per site.

A total of 65 unique alleles of the rplF fragment were identified
among 926 isolates present in the PubMLST Neisseria database at
the time of analysis. Each allele was assigned to a genospecies as
described previously (Table 1). N. mucosa, N. sicca, and N. macacae
are now considered one species (N. mucosa), as they clustered
as one group using rMLST (8). These organisms exhibit either
indistinguishable (2) or highly similar 16S rRNA sequences (21).
N. flavescens is now considered to be the same species as N. subflava,
as these two species were indistinguishable using rMLST (8).
The rplF fragment alleles were specific for each species group,
except for allele 21, which was present in N. mucosa as well as a
species previously defined as "Neisseria mucosa var. heidelbergensis"
(22), now renamed N. oralis (23). Among WGS data for 804 N.
meningitidis and 17 N. gonorrhoeae isolates, there were 6 and 2
unique rplF fragment alleles, respectively. The rplF fragment
alleles from N. polysaccharea and N. meningitidis, the two species
most closely related to the type species N. gonorrhoeae (4), were
most similar to N. gonorrhoeae allele 7, with 10 and 12 nucleotide
differences, respectively. Fixed nucleotide sequence differences
were present among all species groups examined, with N. polysaccharea
and N. meningitidis alleles having four and seven fixed
differences, respectively, from allele 7. Although the sequences from
N. polysaccharea and N. meningitidis were similar, there were 15
polymorphisms and 5 fixed differences that differentiated these
two species. Compared to allele 7, the rplF fragment alleles from
the other species of Neisseria were more distantly related, with the
allele from a novel Neisseria species (isolate CCUG 21444),
originally defined as N. cinerea, having 120 nucleotide differences.
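The two quantities reported above, polymorphic sites and fixed differences, can be computed from aligned allele sequences as sketched here. This is an illustration of the definitions only, not the DnaSP implementation: a site is counted as a fixed difference when no base at that alignment column is shared between the two samples, and as polymorphic when more than one base is observed across the pooled sequences. Function names are hypothetical, and the sequences are assumed to be gap-free and of equal length.

```python
def fixed_differences(sample_a, sample_b):
    """Count columns where every sequence in one sample differs from
    every sequence in the other sample."""
    sample_a, sample_b = list(sample_a), list(sample_b)
    length = len(sample_a[0])
    if any(len(seq) != length for seq in sample_a + sample_b):
        raise ValueError("all sequences must have the same aligned length")
    count = 0
    for i in range(length):
        bases_a = {seq[i] for seq in sample_a}
        bases_b = {seq[i] for seq in sample_b}
        # Fixed difference: the two samples share no base at this site.
        if bases_a.isdisjoint(bases_b):
            count += 1
    return count


def polymorphic_sites(sequences):
    """Count columns at which more than one base is observed."""
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)
```

A site can be polymorphic without being a fixed difference, which is why the polymorphic-site counts reported for multi-allele species are at least as large as the fixed-difference counts.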

TABLE 1 Species associations of rplF fragment alleles among 926 Neisseria isolates

Species                 No. of       No. of       Allele(s)(d)                      No. of        No. of
                        isolates     alleles                                        polymorphic   fixed
                                                                                    sites         differences
N. gonorrhoeae          17           2            5, 7                              1             1
N. polysaccharea        12           3            9, 39, 44                         10            4
N. meningitidis         804          6            1, 2, 3, 4, 8, 18                 12            7
N. oralis               4            4            21, 26, 36, 68                    49            42
N. elongata             4            4            15, 37, 60, 75                    50            45
N. bergeri(a)           1            [illegible]  16                                63            [illegible]
N. lactamica            [illegible]  [illegible]  [illegible]                       65            56
N. cinerea              [illegible]  5            [3 illegible], 45, 74             70            51
N. subflava(b)          34           14           11, [3 illegible], 31, 38, 40,    72            49
                                                  42, 43, 53, 56, 57, 58, 59
N. animalis             1            1            76                                72            72
N. mucosa(c)            16           12           13, 14, 17, 21, 22, 27, 28, 29,   76            36
                                                  30, 35, 41, 54
N. dentiae              1            1            47                                82            82
N. canis                1            1            48                                102           102
N. wadsworthii          1            1            77                                102           102
N. weaveri              1            1            24                                102           102
N. bacilliformis        4            2            46, 49                            115           111
N. shayeganii           1            1            78                                116           116
Neisseria sp. (novel)   1            1            50                                120           120

(a) Strain originally defined as N. polysaccharea (20), but rMLST shows that it is a distinct
novel species (4) which has yet to be validly published.
(b) Includes the previous species N. flavescens.
(c) Includes the previous species N. sicca and N. macacae.
(d) All alleles are compared to allele 7 from type species N. gonorrhoeae.

Comparison with 16S rRNA species identification. Comparison
of a phylogeny reconstructed from the rplF fragment alleles
from Neisseria type strains with a phylogeny reconstructed using
16S rRNA gene allele fragments (5) demonstrated improved
resolution of members of the genus achieved with the rplF fragment
phylogeny. Species relationships determined using rplF fragment
alleles were more consistent with rMLST species identification
and DNA-DNA hybridization studies (24) than relationships
inferred from 16S rRNA gene phylogenies (Fig. 2). The rplF
fragment allele phylogeny also clustered the more closely related
species that are often found in the human oropharynx separately
from the more distantly related species that are not associated with
humans. A search of the PubMLST Neisseria database also
revealed that some 16S rRNA gene sequences are present in both
commensals and meningococci. For example, 16S rRNA gene
fragment allele 5, originally identified in isolates belonging to the
species N. polysaccharea and N. cinerea, including the type strain of
N. cinerea, was harbored by three pathogenic serogroup W
meningococcal isolates. Allele 46 has also been found in both an N.
polysaccharea isolate and a serogroup B invasive meningococcus.
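The kind of database search described above, looking for alleles that occur in more than one species, can be sketched as a simple grouping step. The function name and the record layout (species, allele number) are assumptions for illustration, and the example records only mirror the 16S rRNA cases mentioned in the text; they are not an export from PubMLST.

```python
from collections import defaultdict


def shared_alleles(records):
    """Return {allele: sorted list of species} for every allele that is
    found in more than one species.

    `records` is an iterable of (species, allele) pairs, one per isolate.
    """
    species_by_allele = defaultdict(set)
    for species, allele in records:
        species_by_allele[allele].add(species)
    return {
        allele: sorted(species)
        for allele, species in species_by_allele.items()
        if len(species) > 1
    }


# Illustrative records only (hypothetical isolates).
RECORDS = [
    ("N. polysaccharea", 5),
    ("N. cinerea", 5),
    ("N. meningitidis", 5),
    ("N. polysaccharea", 46),
    ("N. meningitidis", 46),
    ("N. gonorrhoeae", 7),  # hypothetical allele seen in one species only
]
print(shared_alleles(RECORDS))
```

An allele that appears in the returned mapping cannot, on its own, assign an isolate to a single species, which is the limitation of the 16S rRNA target that this section describes.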

Identifying Neisseria rplF fragment alleles using the
PubMLST Neisseria database. To identify a species using an rplF
fragment, the PubMLST Neisseria database can be queried using
the sequence query interface. Users should choose "rplF species"
and then paste in their nucleotide sequence. If there is an exact
match, an rplF genospecies designation is returned. If there are
polymorphisms present, the closest match is shown and any
nucleotide differences are identified and shown in an alignment,
which can then be translated. All known rplF fragment alleles can
be downloaded from the Neisseria locus/sequence definitions
database in PubMLST, as can the rplF profiles. The Isolate database
can also be searched for any related provenance data. In order to
assign a new allele, novel rplF sequences can be submitted via
PubMLST and a curator will then assign a provisional species
identity by comparing the percentage identity to known
species-specific alleles within the database and reconstructing a phylogeny
using all known rplF fragment alleles and the novel allele.
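The exact-match and closest-match behavior described above can be mimicked offline against a downloaded allele table. The sketch below is not the PubMLST query interface: it assumes the query has already been trimmed to the same region and length as the reference alleles, and it compares sequences position by position instead of using BLAST. The function name, the table layout, and the short reference sequences are all hypothetical placeholders, not real rplF alleles.

```python
def identify(query, reference):
    """Return the best-matching allele for a query sequence.

    `reference` maps allele number -> (sequence, genospecies). The result
    reports whether the match is exact and lists any differences as
    (1-based position, reference base, query base). Returns None when no
    reference allele has the same length as the query.
    """
    query = query.upper()
    best = None
    for allele, (sequence, genospecies) in reference.items():
        if len(sequence) != len(query):
            continue  # this simplified comparison needs equal lengths
        differences = [
            (pos, ref_base, query_base)
            for pos, (ref_base, query_base) in enumerate(zip(sequence, query), start=1)
            if ref_base != query_base
        ]
        if best is None or len(differences) < len(best["differences"]):
            best = {
                "allele": allele,
                "genospecies": genospecies,
                "exact": not differences,
                "differences": differences,
            }
    return best


# Toy 12-base placeholders standing in for 413-bp fragment alleles.
REFERENCE = {
    1: ("ACGTACGTACGT", "N. meningitidis"),
    7: ("ACGTTCGAACGA", "N. gonorrhoeae"),
}
```

An exact match returns the genospecies directly; otherwise the closest allele and its differences are reported, and assigning a new allele and a provisional species remains a curator decision as described above.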

DISCUSSION 

The human body hosts a complex microbiota that is important in 
both health and disease (25). In the case of the genus Neisseria, for 
example, a variety of species colonize the mouth and oropharynx, 
with co-colonization providing a reservoir for horizontal genetic 
exchange (26). While most Neisseria species are harmless
commensals, the meningococci and gonococci are important
pathogens, and understanding the transition from commensal to
pathogen is important in understanding their disease epidemiology
(27). Phenotypic characteristics, such as nutritional requirements
and biochemical tests, have provided the basis of diagnostic mi- 
crobiology for many years; however, there are limitations with 
these methods and the results obtained can be ambiguous, with N. 
cinerea isolates, for example, being misidentified as gonococci (28, 
29). Misidentification of Neisseria can have serious medicolegal 
consequences (28), as well as distorting the results of epidemio- 
logical studies. 

Molecular techniques have increasingly replaced phenotypic 
approaches for characterizing commensal and pathogenic bacte- 
ria, with the sequencing of 16S rRNA gene fragments widely em- 
ployed in diagnostic applications and studies of the microbiome 
(25, 30, 31). Limitations of this target, due to the similarity of 16S 
rRNA genes present in different species, are exemplified by Neis- 
seria. For example, there are indistinguishable 16S rRNA gene 
sequences in N. polysaccharea, N. cinerea, and the meningococci, 
and some meningococci contain a 16S rRNA gene sequence iden- 
tical to that found in gonococci (6). Further, public 16S rRNA 
databases, such as the Human Oral Microbiome database (32) and 
the EzTaxon-e database (3), can provide misleading results. The 
closest match to the 16S rRNA gene sequence from N. lactamica 
020-06 (33) in both databases is a meningococcal sequence. 

A variety of other approaches have been investigated to address 
this problem, for example, the phylogenetic analysis of the nucle- 
otide sequences of the seven MLST loci, sometimes referred to as 
multilocus sequence analysis (MLSA) (34). This approach was 
very effective in distinguishing N. meningitidis, N. gonorrhoeae, 
and N. lactamica (19) but did not group all members of the genus 
into species-specific clusters (4). Another method with promise is 
matrix-assisted laser desorption-ionization time of flight mass 
spectrometry (MALDI-TOF); however, this method requires op- 
timization, as it has been shown only to separate Neisseria into 
three groups, N. meningitidis, N. gonorrhoeae, and other species
(35). The availability of rapid and inexpensive whole-genome se- 
quencing and the gene-by-gene approach (36), as implemented in 
the BIGSdb software (9), has allowed techniques to be developed 
such as rMLST, which unambiguously identifies species and accu- 
rately determines relationships among Neisseria species (4, 7); 
however, rMLST requires WGS data or the analysis of multiple 
sequences which, while definitive, is not necessarily feasible or 
cost-effective for clinical specimens. 

FIG 2 Evolutionary relationships among Neisseria species based on fragments from 16S rRNA and rplF genes. (a) 16S rRNA gene fragments. (b) rplF gene
fragments. T, type strain. Type strains of previous species: *, N. flavescens; **, N. macacae; ***, N. sicca; and N. mucosa var. heidelbergensis. The percentages of
replicate trees in which the associated taxa clustered together in the bootstrap test (1,000 replicates) are shown next to the branches. The unit of measure for the
scale bars is the number of nucleotide substitutions per site.

A short (413-bp) fragment of the rplF gene which encodes the
50S ribosomal protein L6 was found to be a suitable genetic target
for rapid differentiation within Neisseria species, as phylogenies
reconstructed from rplF fragment alleles were consistent with a
phylogeny reconstructed from the concatenated sequences of 53
whole-ribosomal protein genes. The rplF gene variable region is
flanked by conserved regions, a characteristic that enables this
fragment to be sequenced on both DNA strands with two primers.
Among 65 distinct alleles of this gene fragment identified among
926 isolates, none were shared among commensals and pathogens
or between the meningococci and the gonococci, confirming the
suitability of the rplF fragment assay in differentiating pathogenic
and commensal Neisseria species. Only one fragment allele (rplF
21) was found in more than one species (N. oralis and N. mucosa),
neither of which has been known to cause disease. Although the
sequence clusters obtained with the rplF fragment alleles were the
same as those obtained with concatenated ribosomal protein gene
sequences, the phylogeny reconstructed from them was not
identical. Consequently, this single genetic target should not be used
on its own to define a species or used as a replacement for rMLST.
The rplF assay is, however, a practical, rapid, and inexpensive
single-locus tool to differentiate among species within the genus
Neisseria which can be combined with additional single-locus
tests, such as porA sequencing (37) and capsule gene sequencing
(38), for example, to confirm meningococcal identity. The assay
was specifically developed to identify Neisseria species as part of
the MenAfriCar study and has been successfully used to
characterize thousands of samples from heat-killed cell suspensions,
assisting in determining the impact of serogroup A polysaccharide
conjugate vaccines on meningococcal carriage (1, 39).

The rplF fragment allele sequences and associated metadata are 
stored in the PubMLST Neisseria database. It is curated and con- 
tinually updated, providing an extensive library of genomes and 
DNA sequences along with the tools to analyze these data. Al- 
though the majority of the isolates are meningococci, it contains a 
number of representative strains from most species, including cul- 
ture collection strains, as well as isolates from population studies, 
which can be used to query sequences to provide a species identity. 
While the rplF gene fragment assay is specific for Neisseria, the 
general approach can be adapted to identify other bacterial spe- 
cies, as the rp genes are universal (7). However, the rplF gene 
fragment assay has not, at the time of this writing, been adapted to 
identify species within other genera. In addition to species identi- 
fication, ribosomal genes have potential applications in the inves- 
tigation of noncultured samples and in metagenomic studies, 
where resolution finer than that provided by the 16S rRNA gene is 
required (20).